Capstone Project

Applied Data Science Capstone by IBM/Coursera


London venues analysis: a case study for opening of new shops



1. Business Problem


A medium-sized coffee shop franchise - our client - is looking for four suitable places in London to open new shops. The four new shops - three "standard" shops plus one flagship - should be:

a. well spread over the Greater London Area;
b. in zones with a low concentration of competitor shops (competitor = a franchise with between 3 and 6 shops in the whole Greater London Area);
c. in zones with proven business resilience (i.e. a high business survival rate);
d. in zones with high population density;
e. for the flagship, in an INNER LONDON borough;
f. for the other three shops, one in each of the three clusters identified in the OUTER London areas.


2. Assignment


The client commissioned our company to carry out a detailed analysis that identifies four suitable zones in which to open its shops while fulfilling the requirements listed in the Business Problem section.


3. Data and Methodology


Steps:
I. The Greater London boroughs will be analyzed and clustered according to their geographical coordinates; particular focus will then be placed on Business Survival Rate, Population Density and Competitor Shop Density.
II. An appeal index that takes points b., c. and d. into account will be created by applying the following weights to the three major indexes:

| FACTORS | WEIGHTS |
|---|---|
| Coffee Shop Density per Ha | 45% |
| Business Survival Index | 25% |
| Population Density | 30% |

While Business Survival Rate and Population Density can be retrieved directly from the Greater London Authority data, Competitor Shop Density per borough will be assessed by using Foursquare to search for coffee shops within a 3.5 km radius of each borough's center.

In short, the first step divides the OUTER LONDON boroughs into three clusters (INNER LONDON forms a fourth area), while the second step assesses the three major indexes for each borough to support the decision on where the shops should be placed. The decision rests on an APPEAL INDEX calculated as the weighted average of the three major indexes.

The APPEAL INDEX is meant as "the higher, the better": **the best performing borough in each cluster will be the winner**.
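
Written out, and assuming each index has already been rescaled to the range 0-1, the weighted average for a borough $b$ can be sketched as follows. The symbols are introduced here only for illustration: $\hat{d}_b$ is the normalized coffee shop density per Ha, $\hat{s}_b$ the normalized business survival index and $\hat{p}_b$ the normalized population density. Shop density enters through its complement to 1 because it is a "the lower, the better" quantity.

$$
A_b = 0.45\,(1 - \hat{d}_b) + 0.25\,\hat{s}_b + 0.30\,\hat{p}_b
$$

Since the three weights sum to 1, $A_b$ also stays within the range 0-1.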


4. Analysis


4.a Import all needed Python Modules

4.b Import and Show London boroughs selected data

Statistical data is loaded from the Greater London Authority site. This data provides the following information:

I. Borough Code
II. Borough Short Name
III. Inland Area in Ha
IV. Population Density per Ha
V. Average Age (2017)
VI. Proportion of Population of Working Age (2015)
VII. Two-Year Business Survival Rate (started in 2013)

It is worth noting that points V. and VI. are not used in this analysis (they are kept for further analysis).

4.c Search for and Show Geographic Coordinates of each borough

Let's integrate the data from the Greater London Authority with data collected through OpenStreetMap: some extra columns - each borough's center Latitude, Longitude and Extended Name - will be added. Each borough's boundaries will also be saved for future use.

Let's show the borough boundaries in an OpenStreetMap view: each circle marks the point used to assess the borough's Latitude and Longitude.

4.d Create four different clusters: INNER LONDON + THREE CLUSTERS in OUTER LONDON based on geo coordinates (k-Means)

Let's start by aggregating the OUTER LONDON boroughs into three clusters: each of them will host a shop.
The clustering criterion is geographical position, and it is based on the k-Means algorithm.
The fourth cluster is made up of the INNER LONDON boroughs and will host the flagship store.
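
The notebook's own clustering code is not reproduced in this text, and a library implementation of k-Means would normally be used. As a rough, dependency-free illustration of the idea, the sketch below runs plain Lloyd iterations on (latitude, longitude) pairs. Everything in it is an assumption made for the example: the coordinates are invented toy points rather than real borough centers, the farthest-first seeding is chosen only to make the run reproducible, and Euclidean distance on degrees is a simplification that is tolerable only over a small area.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at the
    first point, then repeatedly add the point farthest from the seeds so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = farthest_first_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Invented (lat, lon) points forming three loose groups; NOT real borough data.
toy_points = [
    (51.50, -0.40), (51.52, -0.38), (51.48, -0.42),
    (51.62, 0.10), (51.60, 0.12), (51.64, 0.08),
    (51.36, -0.10), (51.38, -0.08), (51.34, -0.12),
]
labels, centroids = kmeans(toy_points, k=3)
print(labels)
```

With k fixed at three, every OUTER LONDON borough ends up with exactly one cluster label, which is what the later steps need in order to pick one winner per cluster.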

4.e Create an extended dataset (Borough info and Geo Coordinates) and show the boroughs in an INTERACTIVE map

The cluster number of each borough will be added to the dataset, and the columns of this dataset will be rearranged to improve readability.
An ad-hoc GeoJSON file will make it easy to integrate and use the statistical data on the OpenStreetMap map.

<< Comments >>

Four areas have been isolated:

| Area | No. of Boroughs | Cluster No. | Color |
|---|---|---|---|
| INNER LONDON | 14 | #0 | Red |
| OUTER LONDON/WEST AREA | 7 | #1 | Dark Blue |
| OUTER LONDON/NORTH AND EAST AREA | 8 | #2 | Light Blue |
| OUTER LONDON/SOUTH AREA | 4 | #3 | Cadet Blue |

1) The flagship shop will be placed in INNER LONDON (14 Boroughs, Cluster #0, Red)
2) One shop will be placed in the OUTER LONDON/WEST AREA (7 Boroughs, Cluster #1, Dark Blue)
3) One shop will be placed in the OUTER LONDON/NORTH AND EAST AREA (8 Boroughs, Cluster #2, Light Blue)
4) One shop will be placed in the OUTER LONDON/SOUTH AREA (4 Boroughs, Cluster #3, Cadet Blue)

4.e Use Foursquare to retrieve information regarding Coffee Shops in each borough (3.5 km Radius)

Through the Foursquare API we can count the number of coffee shops in each borough; the main criterion is a maximum distance of 3.5 km from each borough's center.
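
The API request itself is not shown here, and the radius filtering is presumably done on the Foursquare side. Purely to illustrate what the 3.5 km criterion means, the following sketch computes great-circle distances with the haversine formula and keeps the venues that fall inside the radius. The function names, the venue dictionaries and the coordinates are all hypothetical and do not reflect the Foursquare response format.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, venues, radius_km=3.5):
    """Keep the venues whose distance from `center` is at most `radius_km`."""
    lat0, lon0 = center
    return [
        v for v in venues
        if haversine_km(lat0, lon0, v["lat"], v["lon"]) <= radius_km
    ]


# Hypothetical borough center and venues, invented for the example.
center = (51.50, -0.10)
venues = [
    {"name": "A", "lat": 51.51, "lon": -0.10},  # about 1.1 km north
    {"name": "B", "lat": 51.55, "lon": -0.10},  # about 5.6 km north
    {"name": "C", "lat": 51.50, "lon": -0.14},  # about 2.8 km west
]
nearby = within_radius(center, venues)
print([v["name"] for v in nearby])
```

Note that a fixed 3.5 km circle around the center does not follow the borough boundary: for small boroughs it can reach into neighbouring ones, and for large boroughs it can miss outlying venues.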

4.f Show all franchises with more than three shops in Greater London Area

Some useful info regarding coffee shops in London: the chart gives a snapshot of the most important franchises by number of shops (only companies with more than 3 shops are shown here).
Our client is interested in the market segment of competitors with between 3 and 7 shops in the Greater London Area: this dataset and its stats are shown as text at the bottom of this section.
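
One simple way to obtain such a segment, sketched here under stated assumptions, is to count how often each venue name occurs and keep the names whose count falls inside the band of interest. The brand names below are invented, the name normalization (trimming and lower-casing) is a guess at how chains might be grouped, and the bounds are treated as inclusive parameters because the text quotes the band both as 3 to 6 and as 3 to 7.

```python
from collections import Counter


def franchise_counts(venue_names):
    """Count shops per franchise, grouping names case-insensitively."""
    return Counter(name.strip().lower() for name in venue_names)


def competitors(counts, min_shops=3, max_shops=7):
    """Franchises whose shop count lies in [min_shops, max_shops] (inclusive)."""
    return {
        name: n for name, n in counts.items()
        if min_shops <= n <= max_shops
    }


# Invented venue names; NOT taken from the Foursquare results.
names = (
    ["Brand A"] * 12    # large chain, above the band
    + ["Brand B"] * 5   # inside the band
    + ["brand b "]      # same chain, different spelling
    + ["Brand C"] * 3   # inside the band
    + ["Brand D"] * 2   # too small
)
counts = franchise_counts(names)
segment = competitors(counts)
print(segment)
```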

4.g Create a joined dataframe featuring clustered boroughs and competitors venues in those boroughs

Borough stats are merged with venue stats to simplify the combined analysis required to set up the Appeal Index.
A large dataset - pairing each venue's stats with the stats of its hosting borough - will be created.

4.h Count and Show the coffee shops in each borough and add this info to the Extended Dataset

A reduced dataset will be created from the previous larger one: each borough will report its stats plus the number of competitor shops in the borough.
A multi-feature Choropleth map will show these stats.

4.i Start to calculate and add to Geo Dataset normalized-to-one variables needed to calculate the appeal index

The Appeal Index is a weighted average of three different indexes:

  1. Shop Density per Ha
  2. Business Survival
  3. Population Density

To make the analysis homogeneous, each index is NORMALIZED to the range 0-1: 0 is the poorest performance, 1 is the best one.

Shop Density per Ha is a "the lower, the better" index; to make it consistent with the other two, which follow "the higher, the better" logic, its complement to 1 is used.
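
The text does not say which normalization is applied. A common choice, assumed in the sketch below, is min-max scaling, followed by the complement to 1 for shop density. The function names and the input values are illustrative only.

```python
def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All boroughs are equal on this index; there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def complement(values):
    """Flip a 'lower is better' index so that higher becomes better."""
    return [1.0 - v for v in values]


# Invented shop densities for three hypothetical boroughs.
shop_density = [1.0, 5.0, 3.0]
normalized = min_max(shop_density)
shop_score = complement(normalized)
print(normalized, shop_score)
```

After this step the borough with the fewest competitor shops per Ha scores 1 and the most crowded one scores 0, in line with the other two indexes.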

4.l Use importance factors and normalized variables to calculate the appeal index

The three indexes are linearly combined with the weights to create the final Area Appeal Index, which is used to produce the final ranking.
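
A minimal sketch of this combination, and of picking the best borough in each cluster, is given below. It uses the 45% / 25% / 30% weights from the methodology and assumes that the inputs are already normalized to 0-1, with shop density already flipped to its complement. The row layout, the key names and the borough labels are hypothetical.

```python
# Weights from the methodology section; the keys are names chosen for this sketch.
WEIGHTS = {"shop_density": 0.45, "survival": 0.25, "pop_density": 0.30}


def appeal_index(row):
    """Weighted sum of the three normalized indexes for one borough."""
    return sum(weight * row[key] for key, weight in WEIGHTS.items())


def best_per_cluster(rows):
    """Return {cluster: (borough, score)} for the top-scoring borough per cluster."""
    best = {}
    for row in rows:
        score = appeal_index(row)
        cluster = row["cluster"]
        if cluster not in best or score > best[cluster][1]:
            best[cluster] = (row["borough"], score)
    return best


# Invented, already-normalized values for four hypothetical boroughs.
rows = [
    {"borough": "B1", "cluster": 0, "shop_density": 1.0, "survival": 0.0, "pop_density": 0.0},
    {"borough": "B2", "cluster": 0, "shop_density": 0.5, "survival": 1.0, "pop_density": 1.0},
    {"borough": "B3", "cluster": 1, "shop_density": 0.0, "survival": 0.5, "pop_density": 0.5},
    {"borough": "B4", "cluster": 1, "shop_density": 1.0, "survival": 0.5, "pop_density": 0.0},
]
winners = best_per_cluster(rows)
print(winners)
```

Because the weights sum to 1, a borough that is best on all three indexes scores exactly 1, which keeps scores comparable across clusters.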


5. Results: the appeal index and cluster membership per each borough


The borough appeal is shown here both as text and as a Choropleth Map.
The Choropleth Map is multi-feature: the Appeal Index can be shown together with borough cluster membership.


6. Conclusion and Final Comments


Generally, the INNER LONDON boroughs look more appealing than the OUTER LONDON ones; one relevant exception is the City of London, which has a very poor index due to the large number of shops in a very small area.
The best areas to place the shops - fulfilling the requirements - are: